Novel metrics for quantifying bacterial genome composition skews

Authors

  • Lena M. Joesch-Cohen
  • Max Robinson
  • Neda Jabbari
  • Christopher Lausted
  • Gustavo Glusman
Abstract

We present three novel metrics for quantifying bacterial genome composition skews. Skews are asymmetries in nucleotide usage that arise as a result of mutational biases and selective constraints, particularly for energy efficiency. The first two metrics (dot product and cross product of average skew vectors) evaluate sequence and gene annotation of the genome of a single species, while the third metric (regression RMSD) discovers patterns only discernable from studying genomes of thousands of species. The three metrics can be computed for genomes not yet finished and fully annotated. We studied the genomes of 7738 bacterial species, including completed genomes and partial drafts, and identified multiple species with unusual skew parameters. A number of these outliers (i.e., Borrelia, Ehrlichia, Kinetoplastibacterium, and Phytoplasma) display similar skew patterns despite a lack of phylogenetic relation. These disparate bacterial species share lifestyle characteristics, suggesting that our novel metrics successfully capture effects on genome composition of biosynthetic constraints and of interaction with the hosts.

Introduction

Bacterial genomes display significant compositional biases, both in terms of G+C content and in skews (strand asymmetry in ‘T’ vs. ‘A’ and ‘G’ vs. ‘C’ usage). These biases arise from the complex interplay of differential mutation rates and multiple selective constraints (Morton and Morton 2007; Vetsigian and Goldenfeld 2009), particularly for energy efficiency (Chen et al. 2016). Bacterial chromosomes are replicated in both directions, from the origin of replication site to the terminator site; the ...

bioRxiv preprint first posted online Aug. 15, 2017; doi: http://dx.doi.org/10.1101/176370. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
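
As a rough illustration of the skews described in the abstract (strand asymmetry in T vs. A and G vs. C usage), the sketch below computes a two-component skew vector for a DNA sequence using the commonly used normalized differences (T-A)/(T+A) and (G-C)/(G+C), then takes the dot product and the 2D cross product of two such vectors. The abstract does not define how the average skew vectors are built, so the function names, the choice of input sequences, and the use of single sequences in place of averages are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def skew_vector(seq):
    """Return (TA skew, GC skew) for one DNA sequence.

    Uses the normalized differences (T-A)/(T+A) and (G-C)/(G+C).
    A component is 0.0 when its two bases are both absent.
    """
    counts = Counter(seq.upper())
    t, a, g, c = (counts[base] for base in "TAGC")
    ta = (t - a) / (t + a) if (t + a) else 0.0
    gc = (g - c) / (g + c) if (g + c) else 0.0
    return ta, gc


def dot(u, v):
    """Dot product of two 2D skew vectors."""
    return u[0] * v[0] + u[1] * v[1]


def cross(u, v):
    """Scalar (z-component) cross product of two 2D skew vectors."""
    return u[0] * v[1] - u[1] * v[0]


# Hypothetical toy sequences; real input would be genome or gene sequences.
u = skew_vector("TTTAGGGC")  # T- and G-rich
v = skew_vector("GCCCTAAA")  # reverse complement of the first sequence
print(u, v, dot(u, v), cross(u, v))
```

In this toy case the second sequence is the reverse complement of the first, so its skew vector points in the opposite direction: the dot product is negative and the cross product is zero.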

Similar resources

Ongoing evolution of strand composition in bacterial genomes.

We tried to identify the substitutions involved in the establishment of replication strand bias, which has been recognized as an important evolutionary factor in the evolution of bacterial genomes. First, we analyzed the composition asymmetry of 28 complete bacterial genomes and used it to test the possibility that asymmetric deamination of cytosine might be at the origin of the bias. The model...

DNA Asymmetric Strand Bias Affects the Amino Acid Composition of Mitochondrial Proteins

Variations in GC content between genomes have been extensively documented. Genomes with comparable GC contents can, however, still differ in the apportionment of the G and C nucleotides between the two DNA strands. This asymmetric strand bias is known as GC skew. Here, we have investigated the impact of differences in nucleotide skew on the amino acid composition of the encoded proteins. We com...

Kullback Leibler divergence in complete bacterial and phage genomes

The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback-Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) ...

A novel skew analysis reveals substitution asymmetries linked to genetic code GC-biases and PolIII a-subunit isoforms

Strand biases reflect deviations from a null expectation of DNA evolution that assumes strand-symmetric substitution rates. Here, we present strong evidence that nearest-neighbour preferences are a strand-biased feature of bacterial genomes, indicating neighbour-dependent substitution asymmetries. To detect such asymmetries we introduce an alignment free index (relative abundance skews). The pr...

Publication date: 2017